Sustainability through a gender lens: The extent to which research on UN Sustainable Development Goals includes sex and gender consideration

Through efforts of the Gender Summits and UN Women, it is evident that all United Nations (UN) Sustainable Development Goals (SDGs) targets must be viewed from a gender perspective to ensure that the outcomes benefit women and men equally. Our research focuses on the extent to which sex and gender topics are explicitly covered in research related to the SDGs. Expanding on previous studies, we have developed an approach to detect and visualize the volume and proportion of research publications that include explicit mention of sex and gender terms. The approach visualizes the topical coverage of the publications in the corpus of each SDG as a term map, and overlays that view with the proportion of the publications associated with sex and gender topics. We show that attention to sex and gender topics is uneven across the SDGs, and that even where overlap between an SDG and consideration of sex and gender is high, significant topical areas of relevance to the SDG have little explicit connection with sex and gender. This study lays the groundwork for the evidence-based development of a roadmap toward greater integration of sex and gender across all SDGs as well as monitoring integration progress over time.


Introduction
In recent years, there has been growing recognition of the benefits of incorporating sex and gender analysis into research, with calls for this dimension to be considered from the research design stage [1,2]. By doing so, research questions will be answered more comprehensively and the research itself will be more robust and reproducible [3].
It has also become evident, particularly through a report by UN Women [4] and discussions and work presented at the Gender Summits held since 2011 [5,6], and previous research [7], that the targets for the United Nations' (UN) Sustainable Development Goals (SDGs) [8] must be viewed from a gender perspective to ensure that gender-responsive policies and accountability processes are developed and the outcomes to achieve the goals benefit women and men equally. This is also highlighted by the UN principle of 'Leave No One Behind' [9]. Indeed, progress on the goals can only be achieved through action plans that incorporate this consideration [10]. The World Health Organization (WHO) has also called for more nuanced consideration of gender: a broader view of the concept [4] and a deeper understanding of the reasons why gender differences exist [11]. Attention to sex and gender has increased in many areas of life in recent years, but this important dimension is still often missing from published research. This is especially the case where the first and last authors of publications are not women [12], and it has been demonstrated in recent findings from an Elsevier report that women authors are less commonly last authors than men [13]. Although the significance of the first and last author position varies, in some fields, such as molecular biology, the last author position is usually reserved for the principal investigator or equivalent [14], an author who steers the overall research project. Taken all together, this suggests that the situation is unlikely to change without much greater attention to this issue in research institutions.
Several of the SDGs are written with a recognition of the role of sex and gender in achieving outcomes, and this is specified in their targets and the indicators used to measure progress. But while the UN has recognized a need for "systematic mainstreaming of a gender perspective" [15], this is not the case across all 17 goals: one UN Women report identified cases where targets that included a gender dimension were lacking that dimension in the monitoring indicator [4] and another reported insufficient data to enable comprehensive tracking of progress [16]. With SDG outcomes fixed to a 2030 target date, building an understanding of how sex and gender are embedded within the research supporting the SDGs is an immediate imperative to support evidence-based implementation of the agenda and ensure that the impacts of the SDGs on gender can be robustly assessed. To establish an understanding based on published research studies, we have developed an approach to detect and visualize the volume and proportion of research publications that include explicit mention of sex and/or gender topics. In the rest of this paper we will outline this approach and highlight some key findings.

Methods
Expanding on previous studies that investigated gender in research from a topical perspective using the Scopus database [17], we have developed a keyword search-based approach to identify publications that explicitly include terms related to sex and/or gender topical research in the title, abstract or keywords. These publications are matched to the corpus of publications reflecting research related to each of 16 SDGs (excluding SDG 17: Partnership for the Goals) that have been defined on the basis of expert-informed Scopus keyword searches augmented with machine learning [18].
As an important point of departure from previous work, in this study we consider sex and gender together. Following the WHO's definitions, sex "refers to the biological characteristics that define humans as male or female," and gender "refers to the socially constructed norms, roles and relations of and among women, men, boys and girls," as well as the "expressions and identities of women, men, boys, girls and gender-diverse people" [19]. While they are separate concepts, they are also related and the keyword search that we have developed incorporates terms that relate to both. This allows for the often interchangeable (and perhaps incorrect) use of terms relating to either or both of these concepts by publication authors.
The keyword-based approach to identify publications including terms related to sex and/or gender in the title, abstract or keywords was created in an iterative fashion as follows. We captured relevant keywords from: i) keywords from publications within the public Mendeley library "Gender in the Global Research Landscape" [20]; ii) terms used by established organizations and societies, e.g. Gender Identity Research and Education Society, UNICEF and UNESCO; and iii) terms provided by Portia Ltd, the organizer of the Gender Summits. Each keyword was tested individually in a Scopus search of publication titles, abstracts and author/index keywords for precision and recall, and appropriate wildcards, Boolean and proximity operators were identified for each. Some keywords (such as 'man', 'marriage' and 'family') were excluded because their non-specificity resulted in decreased precision without any increase in recall. While it is true that many of the terms that appear in the final query add only incrementally to the sum total of the publications retrieved in Scopus, we were conscious not to oversimplify our query to those terms that retrieved the majority of the results (such as the single term 'gender', for example). We also consider that all relevant terms are equally valuable to include. Because sex and gender terms are used inconsistently and sometimes incorrectly in the literature, our keyword search deliberately conflates them in order to reflect that ambiguity.
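The trade-off behind excluding non-specific keywords can be illustrated with a minimal sketch. It assumes that a manually judged set of relevant publications is available for a sample; the paper does not describe how precision and recall were measured, so the function `precision_recall` and all identifiers below are hypothetical.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) of a retrieved set against a judged-relevant set."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judged sample: four publications are truly about sex/gender topics.
relevant = {"pub-a", "pub-b", "pub-c", "pub-d"}

# Hypothetical result sets: a specific query, and the same query plus a
# non-specific keyword that only adds off-topic publications.
specific_query = {"pub-a", "pub-b", "pub-c"}
with_broad_keyword = specific_query | {"pub-x", "pub-y", "pub-z"}

p_specific, r_specific = precision_recall(specific_query, relevant)
p_broad, r_broad = precision_recall(with_broad_keyword, relevant)
print(f"specific query: precision={p_specific:.2f}, recall={r_specific:.2f}")
print(f"with broad keyword: precision={p_broad:.2f}, recall={r_broad:.2f}")
```

In this toy sample the broad keyword halves precision while recall is unchanged, which is the pattern the text gives as the reason for excluding terms such as 'man', 'marriage' and 'family'.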
We studied years 2015 to 2020, but for the term mapping, publications were limited to those published in 2020, as this is the most recent period for which we had a full data year of articles within Scopus when we executed the analysis, and to peer-reviewed types (i.e., articles, reviews, conference proceedings, short surveys and data articles). In selecting 2020, we are illustrating the analysis that our proposed approach could show and the insights that can be revealed. The final, selected Scopus sex and gender keyword search can be found in Table 1.
For each of the 16 SDGs included in this study, a corpus consisting of the publications identified by queries described recently [18] was created from an analytical copy of the Scopus dataset accessed via ICSR Lab [21], snapshot dated June 1st 2021. This delivered a unique publication set for each of the 16 SDGs. A second corpus consisting of the publications identified by each SDG keyword search AND the selected sex and gender keyword search was created from the same data. Publications in the first corpus (SDG-related publications) that also appear in the second corpus (sex and/or gender-related publications) were tagged as such after matching using Scopus unique publication identifiers. This tagging was used as the basis for calculating the proportion of each SDG's publications that include those related to sex and/or gender research topics as well as for developing topical maps using VOSviewer.
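The matching and proportion calculation described above can be sketched as a set intersection on publication identifiers. This is an illustration only: the function name `tag_and_share` and the identifiers are hypothetical, and the actual analysis was run on a Scopus snapshot rather than in-memory sets.

```python
from typing import Dict, Set, Union


def tag_and_share(sdg_ids: Set[str], sex_gender_ids: Set[str]) -> Dict[str, Union[Set[str], int, float]]:
    """Tag SDG publications that also appear in the sex/gender corpus.

    Returns the tagged identifiers, both counts, and the tagged share
    of the SDG corpus.
    """
    tagged = sdg_ids & sex_gender_ids
    share = len(tagged) / len(sdg_ids) if sdg_ids else 0.0
    return {
        "tagged": tagged,
        "n_sdg": len(sdg_ids),
        "n_tagged": len(tagged),
        "share": share,
    }


# Hypothetical publication identifiers, for illustration only.
sdg_corpus = {"pub-001", "pub-002", "pub-003", "pub-004"}
sex_gender_corpus = {"pub-002", "pub-004", "pub-999"}

result = tag_and_share(sdg_corpus, sex_gender_corpus)
print(f"{result['n_tagged']} of {result['n_sdg']} publications tagged ({result['share']:.0%})")
```

Publications that appear only in the sex and gender set (here `pub-999`) do not affect the share, because the denominator is the SDG corpus.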
Table 1. The final, selected Scopus sex and gender keyword search.
gender* OR "son preference" OR "sexual object*" OR "sex traffick*" OR "non binary" OR "human traffick*" OR "force* marriage*" OR "daughter preference" OR "child rear*" OR "sex* affect*" OR "sex* biodivers*" OR sexing OR male OR female OR childbear* OR {sexes} OR "sexual dimorph*" OR "sex* variat*" OR " OR "reproductive work*" OR "reproductive right*" OR "reproductive health*" OR "sex* violen*" OR "sex* harass*" OR "sex* exploit*" OR "sex* discriminat*" OR mother* OR boy OR girl OR father* OR "sex* trait*" OR "sex* health" OR "sex* behavio*r" OR daughter OR parent*) OR TITLE-ABS-KEY(biolog* w/3 sex) OR TITLE-ABS-KEY(biomark* w/5 sex) OR TITLE-ABS-KEY(sex w/5 stratif*) AND DOCTYPE(ar OR re OR cp OR sh OR dp) AND PUBYEAR IS 2020 AND NOT TITLE-ABS-KEY(*engender*)
https://doi.org/10.1371/journal.pone.0275657.t001
VOSviewer is "a software tool used for constructing and visualizing bibliometric networks" [22]; the current version at the time of analysis (Version 1.6.18) was employed. This tool uses natural language processing and network mapping techniques to process publication data exported from Scopus for visualization and further analysis. Owing to the processing limits of VOSviewer, where SDG publication sets were too large for mapping they have been randomly down-sampled to approximately 20,000 publications; this applies to all SDG publication sets except for SDG 1, which resulted in fewer than 20,000 publications and so all publications were included because that volume falls within the processing power of VOSviewer. In VOSviewer, we applied binary counting of terms, meaning that the presence or absence of a term in a publication was used for determining the occurrence frequencies and term co-occurrence, not the number of occurrences of a term in a publication.
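The random down-sampling and binary counting steps can be sketched as follows. This is a simplified illustration, not VOSviewer's implementation: it assumes terms have already been extracted for each publication, and the function names, the fixed cap of 20,000 and the random seed are assumptions made for the example.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def downsample(pub_ids: Sequence[str], cap: int = 20_000, seed: int = 0) -> List[str]:
    """Return all identifiers if at or under the cap, else a random sample of size cap."""
    if len(pub_ids) <= cap:
        return list(pub_ids)
    return random.Random(seed).sample(list(pub_ids), cap)


def binary_term_counts(docs: Dict[str, List[str]]) -> Counter:
    """Count, for each term, the number of publications in which it appears.

    A term repeated within one publication is counted once for that
    publication (binary counting).
    """
    counts: Counter = Counter()
    for terms in docs.values():
        counts.update(set(terms))
    return counts


# Hypothetical pre-extracted terms per publication.
docs = {
    "pub-1": ["climate", "climate", "policy"],
    "pub-2": ["climate"],
}
counts = binary_term_counts(docs)
print(counts["climate"], counts["policy"])

sample = downsample([f"pub-{i}" for i in range(50)], cap=20)
print(len(sample))
```

Under binary counting, "climate" contributes one occurrence from `pub-1` even though it appears there twice.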
We applied a term occurrence threshold of at least 100 occurrences for inclusion in the map across each SDG's publication set with the exception of SDG 1, where the smaller volume of retrieved publications meant that a threshold of at least 50 publications was more appropriate; these thresholds were selected heuristically as a trade-off between comprehensiveness and readability/interpretability of the resulting maps, and other thresholds did not materially alter our observations. VOSviewer's default setting to map 60% of the most relevant terms (based on the calculated relevance score) was selected since "terms with a high relevance score tend to represent specific topics covered by the text data, while terms with a low relevance score tend to be of a general nature and tend not to be representative of any specific topic. By excluding terms with a low relevance score, general terms are filtered out and the focus shifts to more specific and more informative terms. By default, 40% of the terms are excluded based on their relevance score" [23]. To further validate the specificity of the sex and gender keyword search, we carefully examined the results of the approach described above for SDG 5: Gender Equality. We found the expected high degree of overlap between the SDG 5 publication set and those publications within it tagged as also being identified by the sex and gender keyword search.
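The two filtering steps described above (an occurrence threshold, then keeping the 60% most relevant terms) can be sketched as below. The relevance scores are computed internally by VOSviewer; here they are simply supplied as input, and the function name `select_terms`, the example scores and the rounding of the 60% cut-off are assumptions made for illustration.

```python
from typing import Dict, List


def select_terms(
    occurrences: Dict[str, int],
    relevance: Dict[str, float],
    min_occurrences: int = 100,
    keep_fraction: float = 0.6,
) -> List[str]:
    """Keep terms meeting the occurrence threshold, then the top share by relevance."""
    eligible = [term for term, n in occurrences.items() if n >= min_occurrences]
    # Highest relevance first; the lowest-scoring terms are dropped.
    eligible.sort(key=lambda term: relevance[term], reverse=True)
    n_keep = round(len(eligible) * keep_fraction)
    return eligible[:n_keep]


# Hypothetical occurrence counts and relevance scores.
occurrences = {"a": 150, "b": 120, "c": 300, "d": 100, "e": 99, "f": 500}
relevance = {"a": 0.9, "b": 0.2, "c": 0.5, "d": 0.7, "e": 1.0, "f": 0.1}

selected = select_terms(occurrences, relevance)
print(selected)
```

Term `e` is excluded by the occurrence threshold despite its high relevance score, and the two least relevant of the five remaining terms are dropped by the 60% cut-off.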

Most SDGs have a low and steady proportion of publications related to sex and/or gender
The number of research publications in 2020 identified by each SDG query is shown in Table 2, along with the number and proportion of these that were also identified by the sex and gender keyword query. The variation in the volume of each SDG's corpus is apparent, ranging from SDG 3: Good Health and Well-being (417,443 publications) to the much smaller SDG 1: No Poverty (13,424 publications), to some extent reflecting the disparity in the number and complexity of the targets relating to each goal (e.g. 13 targets for SDG 3 versus 7 targets for SDG 1). Of particular relevance to the present work we note the relatively small number of publications returned by the SDG 5: Gender Equality query (25,601 publications). Most of the SDGs have a low proportion of publications that explicitly relate to sex and gender research topics. SDG 5: Gender Equality and SDG 3: Good Health and Well-being stand out for their high shares (95% and 62%, respectively). Among the remainder, no SDG has a share above 40% and eight are under 10%. This means that most research on SDGs does not include explicit consideration of sex and/or gender: indeed, among the full, deduplicated dataset of 1.6 million publications, 21% of publications explicitly mentioned sex or gender.
Table 2. SDGs 1-16 with the count of publications in 2020 identified by the SDG query, and the count and proportion of publications of these also found by the sex and gender keyword search. SDGs are ranked descending by this proportion. The gender classifications are those identified within the UN Women report [4].
The ranking of the SDGs by proportion of sex and/or gender topical publications does align to some extent with the classification that the report from UN Women gave to the SDG indicator framework [4]. In Figure 2.1 of that report, each SDG was classified as either "gender-sensitive", "gender-sparse" or "gender-blind" to reflect the extent to which the SDG indicators are gender-specific; this classification appears in the final column of Table 2. While the UN Women classification addressed gender (not sex), it is nonetheless clear that the "gender-sensitive" SDGs tend to have among the higher proportions of sex and gender publications, and the "gender-sparse" and "gender-blind" SDGs tend to have the lowest shares of sex and gender publications. However, SDG 10: Reduced Inequalities appears to have a higher than expected proportion of publications on sex and gender given its "gender-sparse" indication, and conversely the "gender-sensitive" SDG 8: Decent Work and Economic Growth has a relatively low proportion. It is important to note that some of the SDGs with low proportions of publications explicitly mentioning sex and gender and which are "gender-sparse" or "gender-blind" under the UN Women classification nevertheless do deal with targets and topics of high sex and gender relevance. For instance, SDG 7: Affordable and Clean Energy indicates a need to transition much of the world's population towards "clean cooking fuels and technologies" which is highly gender-relevant; such nuanced views are more clearly resolved by disaggregating these summary statistics into thematic topics through the use of a term mapping approach.
The results for the single year snapshot (2020) serve as the basis for the next steps in our approach to depict the extent to which sex or gender is considered in the research for each of the 16 SDGs. However, we also wanted to understand how the proportions are changing over time. The SDGs were established in 2015 [8] and so we also calculated the proportions of each SDG's publications that consider sex or gender for the years 2015 to 2020 (Table 3). Although there are publishing lags that may mean that some 2015 papers were written prior to the SDGs being established, we include all years for completeness. Table 3 shows just the proportions for each year; in the online supplementary information section of this paper, a full table including the counts of all publications relating to each SDG, and of the subset that relate to sex or gender, can be found (Table 1). The results for 2015-2020 (Table 3) show very little change in the proportions of publications that consider sex or gender. The largest change was for SDG 3: Good Health and Well-being, where the proportion of publications referencing sex and gender dropped by 2.0 percentage points between 2019 and 2020. If this change did prove to be meaningful in the longer term, we might speculate that this may be linked to an influx of COVID-19 research in 2020 which did not explicitly make reference to sex or gender, particularly in the early stages of the pandemic response. The only other notable change is for SDG 16: Peace, Justice and Strong Institutions, which had a slight increase in the proportion of publications that consider sex or gender, increasing by 1.5 percentage points across the six-year period. For the rest of the SDGs, the proportions fluctuate slightly but remain steady.
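Changes over time are reported above in percentage points, that is, as the arithmetic difference between two shares rather than as a relative change. A minimal sketch of that calculation follows; the function name and the yearly shares are hypothetical and are not taken from Table 3.

```python
from typing import Dict


def percentage_point_change(shares_by_year: Dict[int, float], start: int, end: int) -> float:
    """Difference between two yearly shares, expressed in percentage points."""
    return round((shares_by_year[end] - shares_by_year[start]) * 100, 1)


def relative_change(shares_by_year: Dict[int, float], start: int, end: int) -> float:
    """Relative change between two yearly shares, expressed as a percentage."""
    base = shares_by_year[start]
    return round((shares_by_year[end] - base) / base * 100, 1)


# Hypothetical shares of one SDG's publications tagged as sex/gender related.
shares = {2015: 0.100, 2020: 0.115}

print(percentage_point_change(shares, 2015, 2020))
print(relative_change(shares, 2015, 2020))
```

With these illustrative shares, a rise from 10.0% to 11.5% is a change of 1.5 percentage points but a relative increase of 15%, so the two measures should not be read interchangeably.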
Table 3. SDGs 1-16 with the proportion of publications found by each SDG query and by the sex and gender keyword search for publication years 2015-2020. The gender classifications are those identified within the UN Women report [4]. In the supplementary information accompanying this article, an extended version of this table (Table 2) is available which includes the raw counts of the publications associated with each SDG, and the raw counts of those publications also associated with sex and/or gender.
For the most current view using the most recent full year at the time of conducting the research, we selected 2020 to demonstrate our approach to mapping for insights. Though we do acknowledge that 2020 was not a typical year owing to the onset of the COVID-19 pandemic, we nonetheless believe that any perturbations in research output and in sex and gender focus that year will be small and likely to appear only in SDG 3: Good Health and Well-being.
The term mapping approach allows us to examine topical clusters in the corpus of research relevant to each SDG, and to overlay this with a view on those clusters which explicitly include terms related to sex and/or gender, which represent publications on sex and gender research topics. The advantage of this approach is that it allows the terms used by the authors of these publications themselves to be examined, rather than relying on any external classification scheme that may not be sufficiently fine-grained to detect niche topics or emerging research fronts. In the following sections we will present a selection of these SDG maps. In the online S1 File, the maps for all 16 SDGs are presented.
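The overlay used in the maps colors each term by the proportion of its publications that were also identified by the sex and gender keyword search. A minimal sketch of that per-term score is given below; it assumes terms have already been extracted for each publication, and the function name `overlay_scores` and the example data are hypothetical rather than VOSviewer's own implementation.

```python
from typing import Dict, List, Set


def overlay_scores(docs: Dict[str, List[str]], tagged_ids: Set[str]) -> Dict[str, float]:
    """For each term, the share of publications containing it that are tagged.

    Each publication counts at most once per term (binary counting).
    """
    totals: Dict[str, int] = {}
    tagged: Dict[str, int] = {}
    for pub_id, terms in docs.items():
        for term in set(terms):
            totals[term] = totals.get(term, 0) + 1
            if pub_id in tagged_ids:
                tagged[term] = tagged.get(term, 0) + 1
    return {term: tagged.get(term, 0) / n for term, n in totals.items()}


# Hypothetical terms per publication; only pub-1 is tagged as sex/gender related.
docs = {
    "pub-1": ["adaptation", "women"],
    "pub-2": ["adaptation", "carbon"],
    "pub-3": ["carbon"],
}
scores = overlay_scores(docs, tagged_ids={"pub-1"})
for term in sorted(scores):
    print(term, scores[term])
```

A term with a score near 0 would appear at the low end of the color scale and a term with a score near 1 at the high end.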

SDG 5: Gender equality has the expected high coverage of sex and gender relevant research
The term map created for SDG 5: Gender Equality is shown in Fig 1. The overlay in Fig 2 reflects the high degree of overlap (95%, Table 2) between the SDG publication set and those publications within it tagged as also being identified by the sex and gender keyword search, i.e. those that also cover sex and gender research topics. None of the terms visible in the map at this scale are associated with a low proportion of sex and gender tagged publications.

Sex and gender relevant research is unevenly spread across SDG 3: Good Health and Well-being
Given the importance of human health to societies around the world, and the volume of funding made available to address research priorities ranging from formulating evidence-based public health policies to understanding the human genome, the SDG 3: Good Health and Well-being keyword search retrieves an order of magnitude more publications than most other SDGs. Acknowledging the "gender-sensitive" nature of this SDG [4] and the high proportion of publications that are also identified by the sex and gender keyword search (62% of publications per Table 2), the strong overlap of sex and gender in this SDG shown in Fig 4 is to be expected. What is perhaps surprising is that the terms in this map associated with publications that do not explicitly mention sex and/or gender terms are largely in the topical clusters relating to fundamental biology (the cluster on the right side of the map) but also those terms on the left of the map that relate specifically to the COVID-19 pandemic. This is despite calls for sex and gender to be incorporated and reported in research across the spectrum of human health [3,24-26]. Such calls recognise that our understanding of the underlying causes of poor health and effective prevention and treatment requires attention to sex and/or gender disaggregation in studies in the published research literature. In the case of the COVID-19 pandemic-related terms, this apparent under-acknowledgement of the role of sex and gender is despite early evidence that COVID-19 patient outcomes are different for men and women [27].
Fig 2. Binary counting (present/absent, not count of occurrences) was applied to terms in titles and abstracts of 25,601 publications in 2020, and those with at least 100 occurrences were mapped using VOSviewer. Node size indicates count of occurrences, and node proximity reflects frequency of co-occurrence (nodes close together co-occur more frequently than nodes far apart). In this overlay visualization, the color scale indicates the proportion of publications associated with the mapped terms that were also identified by the sex and gender keyword search: blue nodes indicate terms with relatively low consideration of sex and/or gender; yellow nodes indicate terms with relatively high consideration of sex and/or gender.
https://doi.org/10.1371/journal.pone.0275657.g002
Fig 3. Binary counting (present/absent, not count of occurrences) was applied to terms in titles and abstracts of 19,983 publications in 2020 (sampled from 417,443 in total), and those with at least 100 occurrences were mapped using VOSviewer. Node size indicates count of occurrences, and node proximity reflects frequency of co-occurrence (nodes close together co-occur more frequently than nodes far apart). In this network visualization, the colors indicate topical clusters.
https://doi.org/10.1371/journal.pone.0275657.g003
Fig 4. Binary counting (present/absent, not count of occurrences) was applied to terms in titles and abstracts of 19,983 publications in 2020 (sampled from 417,443 in total), and those with at least 100 occurrences were mapped using VOSviewer. Node size indicates count of occurrences, and node proximity reflects frequency of co-occurrence (nodes close together co-occur more frequently than nodes far apart). In this overlay visualization, the color scale indicates the proportion of publications associated with the mapped terms that were also identified by the sex and gender keyword search: blue nodes indicate terms with relatively low consideration of sex and/or gender; yellow nodes indicate terms with relatively high consideration of sex and/or gender.

Sex and gender aspects of SDG 4: Quality education are explicit in education practice but not education policy research
Pedagogy is the theory and practice of teaching and learning, and research relevant to this field is represented in the map for SDG 4: Quality Education shown in Fig 5 by four topical clusters. On the right of the map, the green cluster represents terms dealing primarily with educational settings from early years (preschool and kindergarten) through to high school (secondary education); on the left, the focus is on tertiary education as well as vocational education and labour force outcomes. The blue cluster at the top of the map deals with issues relating to medical education. SDG 4 is classed as "gender-sensitive" by the UN Women report [4] and for many years it has been known that formative educational experiences and educational attainment are different for boys and girls [28]. More recently, it has also become clear that teacher (professor) gender in higher education affects attainment and outcomes for women but not for men [29]. However, Fig 6 appears to show that sex and gender elements are included in publications addressing topics around early years, junior and senior school education (and to a lesser extent in medical education) but are almost absent from those parts of the map dealing with higher education or the effectiveness of classroom teaching.
Fig 5. Binary counting (present/absent, not count of occurrences) was applied to terms in titles and abstracts of 20,030 publications in 2020 (sampled from 37,206 in total), and those with at least 100 occurrences were mapped using VOSviewer. Node size indicates count of occurrences, and node proximity reflects frequency of co-occurrence (nodes close together co-occur more frequently than nodes far apart). In this network visualization, the colors indicate topical clusters.
https://doi.org/10.1371/journal.pone.0275657.g005

SDG 13: Climate action has very low coverage of sex and gender relevant research
The final map we examine in detail here is for SDG 13: Climate Action, and Fig 7 illustrates the breadth of topics that this global challenge encompasses. The red cluster in the bottom left covers climate change risk, response and adaptation, while the blue cluster is focussed on carbon, especially carbon capture and storage (perhaps unsurprising given the policy emphasis on carbon sequestration [30]). Linking these is the green cluster, dealing primarily with climate and energy governance and policy. SDG 13 has just a 3% overlap of publications found by both the SDG query and the sex and gender keyword query (and is considered "gender-sparse" according to the UN Women classification [4]). As Fig 8 makes clear, very few terms in this map are associated with a relatively high proportion of sex and gender tagged publications, and these are related mainly to perceptions of and adaptation to climate change. This seems appropriate, since evidence is building that women and men experience climate change effects differently, partly as a result of the intersection between gender, poverty and political engagement [31-35].
Fig 6. Binary counting (present/absent, not count of occurrences) was applied to terms in titles and abstracts of 20,030 publications in 2020 (sampled from 37,206 in total), and those with at least 100 occurrences were mapped using VOSviewer. Node size indicates count of occurrences, and node proximity reflects frequency of co-occurrence (nodes close together co-occur more frequently than nodes far apart). In this overlay visualization, the color scale indicates the proportion of publications associated with the mapped terms that were also identified by the sex and gender keyword search: blue nodes indicate terms with relatively low consideration of sex and/or gender; yellow nodes indicate terms with relatively high consideration of sex and/or gender.
https://doi.org/10.1371/journal.pone.0275657.g006

Discussion
The approach described here offers a fresh perspective on both the UN SDGs and sex and gender consideration in SDG research by visualizing the topical coverage of the publications in the corpus of each SDG as a term map, and then overlaying that view with the proportion of the publications associated with each term that also explicitly include sex and/or gender terms. In establishing this approach, we want to emphasize that this is not an end-point in and of itself: as research evolves and the terms used by authors to describe their work in their publications evolve too, we understand that these keyword queries will need to be revised and updated, perhaps even extended or narrowed as policy and research priorities shift and change.
What we have been able to show with great clarity even with a single year (2020) snapshot is that consideration of sex and gender is uneven across the SDGs (Table 2), and that even where overlap between the SDG and sex and gender research corpus is high as in SDG 5: Gender Equality, significant topical areas of relevance to the SDG do not consider sex and/or gender (Figs 1 and 2). Furthermore, we have demonstrated that the proportion of publications that consider sex and/or gender is quite steady over time, despite increasing calls for this consideration and even as the SDGs matured. With this, we have demonstrated that there is progress to be made if we are to ensure that women and men benefit equally from achievements that stem from the UN SDGs. However, we acknowledge that the formulation of the SDG goals and targets was not necessarily conducted with reference to sex and gender considerations and that it should not necessarily be expected that the research community has responded to the design of the UN SDGs with targeted sex and gender relevant research.
Fig 7. Binary counting (present/absent, not count of occurrences) was applied to terms in titles and abstracts of 20,030 publications in 2020 (sampled from 42,699 in total), and those with at least 100 occurrences were mapped using VOSviewer. Node size indicates count of occurrences, and node proximity reflects frequency of co-occurrence (nodes close together co-occur more frequently than nodes far apart). In this network visualization, colors indicate topical clusters.
https://doi.org/10.1371/journal.pone.0275657.g007
Importantly, this study lays the groundwork for the evidence-based development of a roadmap toward greater integration of sex and/or gender across research in every SDG, as well as an approach to evaluate change across each SDG over time. Our approach could be used to inform future reports by UN Women [4] to provide a rigorous evidence base to support the inclusion of sex and gender in the formulation of fresh sustainability goals and targets.
Fresh maps and tables can be created each year to monitor sex and/or gender integration progress as we move toward the 2030 SDG target goal. Furthermore, this approach could be adapted to investigate the extent to which animal and human subject studies in SDG research clusters incorporate sex and/or gender disaggregated analyses (also referred to as sex and gender-based analysis, SGBA), drawing on methodology developed for the 2018 She Figures report [36] and included within a study examining the sex and gender-based analysis of Alzheimer's Disease research studies [37].
Fig 8. Binary counting (present/absent, not count of occurrences) was applied to terms in titles and abstracts of 20,030 publications in 2020 (sampled from 42,699 in total), and those with at least 100 occurrences were mapped using VOSviewer. Node size indicates count of occurrences, and node proximity reflects frequency of co-occurrence (nodes close together co-occur more frequently than nodes far apart). In this overlay visualization, the color scale indicates the proportion of publications associated with the mapped terms that were also identified by the sex and gender keyword search: blue nodes indicate terms with relatively low consideration of sex and/or gender; yellow nodes indicate terms with relatively high consideration of sex and/or gender.